Tree Topological Features for Unlexicalized Parsing
Authors
Abstract
Because unlexicalized parsing lacks word token information, it is important to investigate novel parsing features that improve accuracy. This paper studies a set of tree topological (TT) features, which quantitatively describe the shape of the tree dominated by each non-terminal node. The features are useful in capturing linguistic notions such as grammatical weight and syntactic branching, which are important factors in syntactic processing but have been overlooked in the parsing literature. Used in an ensemble classifier-based model, TT features significantly improve the parsing accuracy of our unlexicalized parser. Further, because TT feature values are easy to estimate, they can be incorporated into virtually any mainstream parser.
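To make the idea of "quantitatively describing the tree shape dominated by a non-terminal node" concrete, here is a minimal sketch in Python. The abstract does not list the paper's actual TT features, so the four statistics below (leaf count as a rough proxy for grammatical weight, subtree height, non-terminal count, and a right-minus-left leaf difference as a rough proxy for branching direction) are illustrative assumptions, as are the names `Node` and `shape_features`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node of an unlexicalized parse tree (hypothetical helper type).

    Leaves stand for POS tags, since no word tokens are available.
    """
    label: str
    children: List["Node"] = field(default_factory=list)


def shape_features(node: Node) -> Dict[str, int]:
    """Compute illustrative shape statistics for the subtree under `node`.

    These are assumed stand-ins, not the feature set defined in the paper.
    """
    if not node.children:
        # A leaf (POS tag) spans one position and dominates nothing.
        return {"leaves": 1, "height": 0, "nonterminals": 0, "branch_bias": 0}

    child_feats = [shape_features(child) for child in node.children]
    return {
        # Number of leaves dominated: a rough proxy for grammatical weight.
        "leaves": sum(f["leaves"] for f in child_feats),
        # Longest path from this node down to a leaf.
        "height": 1 + max(f["height"] for f in child_feats),
        # Internal nodes in the subtree, including this one.
        "nonterminals": 1 + sum(f["nonterminals"] for f in child_feats),
        # Leaves under the rightmost child minus leaves under the leftmost
        # child: positive values suggest a right-heavy subtree.
        "branch_bias": child_feats[-1]["leaves"] - child_feats[0]["leaves"],
    }


# (S (NP DT NN) (VP VBD (NP DT NN)))
tree = Node("S", [
    Node("NP", [Node("DT"), Node("NN")]),
    Node("VP", [Node("VBD"), Node("NP", [Node("DT"), Node("NN")])]),
])
feats = shape_features(tree)
print(feats)
```

Each statistic is computed in a single bottom-up pass over the subtree, which is in keeping with the abstract's remark that TT feature values are easy to estimate.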
Similar resources
Tree Topological Features for Unlexicalized Parsing (Coling 2010)
As unlexicalized parsing lacks word token information, it is important to investigate novel parsing features to improve the accuracy. This paper studies a set of tree topological (TT) features. They quantitatively describe the tree shape dominated by each non-terminal node. The features are useful in capturing linguistic notions such as grammatical weight and syntactic branching, which are fact...
Three-Dimensional Parametrization for Parsing Morphologically Rich Languages
Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase structures are often shallow, there are additional morphological factors that g...
Increasing Accuracy While Maintaining Minimal Grammars in CKY Parsing
Significant work in both lexicalized and unlexicalized parsing has been done in the past ten years. F1 measures of accuracy of over 90% have been achieved (Bikel, 2005), and linguistic notions of lexical dependencies and using head words have been harnessed to create significant improvements in probabilistic CFG note, however, that many of the techniques for improving lexicalized parsing create...
Accurate Unlexicalized Parsing for Modern Hebrew
Many state-of-the-art statistical parsers for English can be viewed as Probabilistic Context-Free Grammars (PCFGs) acquired from treebanks consisting of phrase-structure trees enriched with a variety of contextual, derivational (e.g., markovization) and lexical information. In this paper we empirically investigate the applicability and adequacy of the unlexicalized variety of such parsing model...
Sentence Realization with Unlexicalized Tree Linearization Grammars
Sentence realization, as one of the important components in natural language generation, has taken a statistical swing in recent years. While most previous approaches make heavy use of lexical information in terms of N-gram language models, we propose a novel method based on unlexicalized tree linearization grammars. We formally define the grammar representation and demonstrate learning from...